Estimation of the probability distributions of stochastic context-free grammars from the k-best derivations
Authors
Abstract
The use of the Inside-Outside (IO) algorithm for estimating the probability distributions of Stochastic Context-Free Grammars (SCFGs) in natural language processing is restricted by its time complexity per iteration and the large number of iterations it needs to converge. An algorithm based on the Viterbi score (VS) is used as an alternative. The VS algorithm converges more rapidly but obtains less competitive models. We describe here a new algorithm that considers only the k-best derivations in the estimation process. The experimental results show that this algorithm converges faster than the IO algorithm and yields better models than the VS algorithm.
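The abstract does not give the re-estimation formula, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that each training sentence comes with an explicit list of candidate derivations (in practice these would come from a k-best parser), that each derivation is a list of rules, and that rule counts are weighted by the derivation's probability normalized over the k best derivations of that sentence. The names `derivation_prob` and `reestimate_k_best` are hypothetical.

```python
from collections import defaultdict
from math import prod

# A rule is a pair (lhs, rhs), with rhs a tuple of symbols.
# A derivation is a list of rules; probs maps each rule to its probability.

def derivation_prob(derivation, probs):
    """Probability of a derivation: the product of its rule probabilities."""
    return prod(probs[rule] for rule in derivation)

def reestimate_k_best(samples, probs, k):
    """One re-estimation step using only the k most probable derivations
    of each sentence (assumed scheme, not taken from the paper).

    samples: one list of candidate derivations per training sentence.
    """
    rule_count = defaultdict(float)  # weighted count of each rule
    lhs_count = defaultdict(float)   # weighted count of each left-hand side
    for derivations in samples:
        scored = sorted(
            ((derivation_prob(d, probs), d) for d in derivations),
            key=lambda pair: pair[0],
            reverse=True,
        )[:k]
        total = sum(p for p, _ in scored)
        if total == 0.0:
            continue  # sentence has no derivation with non-zero probability
        for p, d in scored:
            weight = p / total  # normalize over the k best derivations only
            for rule in d:
                rule_count[rule] += weight
                lhs_count[rule[0]] += weight
    new_probs = {}
    for rule, old in probs.items():
        seen = lhs_count[rule[0]]
        # Keep the old value when the left-hand side was never used.
        new_probs[rule] = rule_count[rule] / seen if seen > 0 else old
    return new_probs
```

Under these assumptions, `k = 1` counts only the single best derivation of each sentence, as a Viterbi-style estimate would, while a `k` large enough to cover every candidate derivation weights all of them, which is the behaviour the IO algorithm achieves without enumerating derivations.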
Similar resources
Learning of Stochastic Context-Free Grammars by Means of Estimation Algorithms and Initial Treebank Grammars
The use of the Inside-Outside (IO) algorithm for the estimation of the probability distributions of Stochastic Context-Free Grammars is characterized by the use of all the derivations in the learning process. However, its application in real tasks for Language Modeling is restricted due to the time complexity per iteration and the large number of iterations that it needs to converge. Alternative...
Efficient Disambiguation by Means of Stochastic Tree Substitution Grammars
In Stochastic Tree Substitution Grammars (STSGs), one parse (tree) of an input sentence can be generated by exponentially many derivations; the probability of a parse is defined as the sum of the probabilities of its derivations. As a result, some methods of Stochastic Context-Free Grammars (SCFGs), e.g. the Viterbi algorithm for finding the most probable parse (MPP) of an input sentence, are not ...
Consistency of Stochastic Context-Free Grammars From Probabilistic Estimation Based on Growth Transformations
An important problem related to the probabilistic estimation of Stochastic Context-Free Grammars (SCFGs) is guaranteeing the consistency of the estimated model. This problem was considered in [3, 14] and studied in [10, 4] for unambiguous SCFGs only, when the probabilistic distributions were estimated by the relative frequencies in a training sample. In this work, we extend this result by proving...
Statistical Properties of Probabilistic Context-Free Grammars
We prove a number of useful results about probabilistic context-free grammars (PCFGs) and their Gibbs representations. We present a method, called the relative weighted frequency method, to assign production probabilities that impose proper PCFG distributions on finite parses. We demonstrate that these distributions have finite entropies. In addition, under the distributions, sizes of parses ha...
Computation of the Probability of the Best Derivation of an Initial Substring from a Stochastic Context-Free Grammar
Recently, Stochastic Context-Free Grammars have been considered important for use in Language Modeling for Automatic Speech Recognition tasks [6, 10]. In [6], Jelinek and Lafferty presented and solved the problem of computation of the probability of initial substring generation by using Stochastic Context-Free Grammars. This paper seeks to apply a Viterbi scheme to achieve the computation of th...